Deterministic Motif Mining in Protein Databases
Abstract
Protein sequence motifs describe, by means of an enhanced regular-expression syntax, regions of amino acids that have been conserved across several functionally related proteins. These regions may have implications for the structure and function of the proteins. Sequence motif analysis can significantly improve our understanding of the protein sequence-structure-function relationship. In this chapter, we review the subject of mining deterministic motifs from protein sequence databases. We start by giving a formal definition of the different types of motifs and their specific characteristics. Then, we explore the methods available to evaluate the quality and interest of such patterns. Examples of applications and motif repositories are described. We discuss the algorithmic aspects and the different methodologies for motif extraction. A brief description of how sequence motifs can be used to extract structural-level information patterns is also provided.
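As an illustration of what a deterministic motif written in an "enhanced regular expression syntax" can look like, the sketch below assumes PROSITE-style notation (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repetition). The abstract does not name a specific notation, so this choice is an assumption, as are the function names `prosite_to_regex` and `support`, the toy pattern, and the toy sequences. The support measure shown (the fraction of sequences containing the motif) is one simple way to assess a pattern, not necessarily the measure used in the chapter.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; handles x, [..], {..},
    repetition counts, and the < and > sequence anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # wildcard: any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # A single residue or a [..] class is already valid regex syntax.
        rep = ""
        if low is not None:
            rep = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + rep + suffix)
    return "".join(parts)


def support(pattern: str, sequences: list[str]) -> float:
    """Fraction of sequences that contain at least one match of the motif."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq)) / len(sequences)


# Toy pattern and toy sequences, invented for this example.
toy_pattern = "C-x(2,4)-C-x(3)-[LIVM]-{P}-H"
toy_sequences = ["MKCAACDEFLAHG", "MKCAACDEFLPHG", "GGGG"]

print(prosite_to_regex(toy_pattern))
print(support(toy_pattern, toy_sequences))
```

Only the first toy sequence satisfies the pattern: the second places a proline at the position where `{P}` forbids it, and the third contains no cysteine at all.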
Similar resources
Data mining of protein families using common peptides
Predicting the function of a protein from its sequence is typically addressed using sequence-similarity. Here we propose a motif-based approach, using supervised motif extraction from protein sequences belonging to one functional family. The resulting deterministic motifs form Common Peptides (CPs) that characterize this family, allow for data mining of its proteins and facilitate further parti...
Discovering motif pairs at interaction sites from protein sequences on a proteome-wide scale
MOTIVATION: Protein-protein interaction, mediated by protein interaction sites, is intrinsic to many functional processes in the cell. In this paper, we propose a novel method to discover patterns in protein interaction sites. We observed from protein interaction networks that there exists a kind of significant substructure called interacting protein group pairs, which exhibit an all-versus-all ...
CpG Motif as an Adjuvant in Immunization of a Recombinant Plasmid Encoding Hepatitis C Virus Core Protein
Protein-Based Analysis of Alternative Splicing in the Human Genome
Understanding the functional significance of alternative splicing and other mechanisms that generate RNA transcript diversity is an important challenge facing modern-day molecular biology. Using homology-based, protein sequence analysis methods, it should be possible to investigate how transcript diversity impacts protein structure and function. To test this, a data mining technique ("DiffHit")...
Fast Motif Search in Protein Sequence Databases
Regular expression pattern matching is widely used in computational biology. Searching through a database of sequences for a motif (a simple regular expression) or its variations is an important interactive process that requires fast motif-matching algorithms. In this paper, we explore and evaluate various representations of the database of sequences using suffix trees for two types of query ...
Publication date: 2009